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Abstract 



Data assimilation methodologies arc designed to incorporate noisy observations of a physical system into an under- 
lying model in order to infer the properties of the state of the system. Filters refer to a class of data assimilation 
algorithms designed to update the estimation of the state in an on-line fashion, as data is acquired sequentially. 

L—\ • For linear problems subject to Gaussian noise, filtering can be performed exactly using the Kalman filter. For 
nonlinear systems filtering can be approximated in a systematic way by particle filters. However in high dimensions 
these particle filtering methods can break down. Hence, for the large nonlinear systems arising in applications 
such as oceanography and weather forecasting, various ad hoc filters are used, mostly based on making Gaussian 
approximations. The purpose of this work is to study the accuracy and stability properties of these ad hoc filters. 
We work in the context of the 2D incompressible Navier-Stokcs equation, although the ideas readily generalize to a 

^ range of dissipative partial differential equations (PDEs). By working in this infinite dimensional setting we provide 

an analysis which is useful for the understanding of high dimensional filtering, and is robust to mesh- refinement. 
We describe theoretical results showing that, in the small observational noise limit, the filters can be tuned to 
perform accurately in tracking the signal itself (filter accuracy), provided the system is observed in a sufficiently 

\^J ■ large low dimensional space; roughly speaking this space should be large enough to contain the unstable modes 

\fs , of the linearized dynamics. The tuning corresponds to what is known as variance inflation in the applied litera- 
ture. Numerical results are given which illustrate the theory. The positive results herein concerning filter stability 

00 

iy~. complement recent numerical studies which demonstrate that the ad hoc filters can perform poorly in reproducing 

_*J ■ statistical variation about the true signal. 

o- 

1. Introduction 



Assimilating large data sets into mathematical models of time-evolving systems presents a major challenge in a 
wide range of applications. Since the data and the model are often uncertain, a natural overarching framework for 
the formulation of such problems is that of Bayesian statistics. However, for high dimensional models, investigation 
of the Bayesian posterior distribution of model state given data is not computationally feasible in on-line situations. 
For this reason various ad hoc filters are used. The purpose of this paper is to provide an analysis of such filters. 

The paradigmatic example of data assimilation is weather forecasting: computational models to predict the 
state of the atmosphere currently involving on the order of O(10 8 ) unknowns but these models must be run with 
an initial condition which is only known incompletely. This is compensated for by a large number, currently on 
the order of 0(1O 6 ), partial observations of the atmosphere at subsequent times. Filters are widely used to make 
forecasts which combine the mathematical model of the atmosphere and the data to make predictions. Indeed the 
particular method of data assimilation which we study here includes, as a special case, the algorithm commonly 
known as 3DVAR. This method originated in weather forecasting. It was first proposed at the UK Met Office in 
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1986_ll|, and was developed by the US National Oceanic and Atmospheric Administration (NOAA) soon thereafter; 
see [2j. More details of the implementation of 3DVAR by the UK Met Office can be found in [3J, and by the 
European Centre for Medium- Range Weather Forecasts (ECMWF) in |4|. The 3DVAR algorithm is prototypical of 
the many more sophisticated filters which are now widely used in practice and it is thus natural to study it. The 
reader should be aware, however, that the development of new filters is a very active area and that the analysis here 
constitutes an initial step towards the analyses required for these more recently developed algorithms. For insight 
into some of these more sophisticated filters see [5|, |g, LD, |8|, |9j and the references therein. 

Filtering can be performed exactly for linear systems subject to Gaussian noise: the Kalman filter [10(. For 



nonlinear or non-Gaussian scenarios the particle filter ll| may be used and provably approximates the desired 



probability distribution as the number of particles is increased 12 |. However in practice this method performs 
poorly in high dimensional systems [13| . Whilst there is considerable research activity aimed at overcoming this 
degeneration [lj, [l5| , the methodology cannot yet be viewed as a provably accurate tool within the context of the 
high dimensional problems arising in geophysical data assimilation. In order to circumvent problems associated 
with the representation of high dimensional probability distributions some form of ad hoc Gaussian approximation 
is typically used to create practical filters, and the 3DVAR method which we analyze here is perhaps the simplest 
example of this. This ad hoc filters may also be viewed in the framework of nonlinear control theory. Proving filter 
stability and accuracy has a long history in this field and the paper [16j is closely related to the work we develop 
here. However we work in an infinite dimensional setting, in order to directly confront the high dimensionality 
of many current real-world filtering applications, and this brings new issues to the problem of establishing filter 
stability and accuracy; overcoming these problems provides the focus of this paper. 

In the paper [17J a wide range of Gaussian approximate filters, including 3DVAR, are evaluated by comparing the 
distributions they produce with a highly accurate (and impractical in realistic online scenarios) MCMC simulation 
of the desired distributions. The conclusion of that work is that the Gaussian approximate filters perform well in 
tracking the mean of the desired distribution, but poorly in terms of statistical variability about the mean. In this 
paper we provide a theoretical analysis of the ability of the filters to estimate the mean state accurately. Although 
filtering is widely used in practice, much of the analysis of it, in the context of fluid mechanics, works with finite- 
dimensional dynamical models. Our aim is to work directly with a PDE model relevant in fluid mechanics, the 
Navier-Stokes equation, and thereby confront the high-dimensional nature of the problem head-on. Study of the 
stability of filters for data assimilation has been a developing research area over the last few years and the paper 



18j | contains a finite dimensional theoretical result, numerical experiments in a variety of finite and (discretized) 
infinite dimensional systems not covered by the theory, and references to relevant applied literature. The paper [19J ] 
gives a review of many important aspects relating to assimilating the evolving unstable directions of the underlying 



dynamical system. Our analysis will build in a concrete fashion on the approach in [20| and [2l| which were amongst 
the first to study data assimilation directly through PDE analysis, using ideas from the theory of determining modes 
in infinite dimensional dynamical systems. However, in contrast to those papers, we will allow for noisy observations 



in our analysis. Nonetheless the estimates in 21( form an important component of our analysis. Furthermore the 
large time asymptotic results in |2l| constitute a limiting case of our theory, where there is no observational noise. 
The presentation will be organized as follows: in Section [2] we introduce the Navier-Stokes equation as the 
forward model of interest and formulate the inverse problem of estimating the velocity field given partial noisy 
observations. This leads to a family of filters for the velocity field which have the form of a non-autonomous 
dynamical system which blends the forward model with the data in a judicious fashion; Theorem 12.31 describes this 
dynamical system via the solution of a sequence of inverse problems. In Section [3] we introduce notions of stability 
and prove the Main Theorems 13. 31 and !3. 71 concerning filter stability and accuracy for sufficiently small observational 
noise. In Section|4]we present numerical results which corroborate the analysis; and finally, in Section[5]we present 
conclusions. 



2. Combining Model and Data 



In subsection 12.11 wc describe the forward model that we employ throughout the remainder of the paper: the 
Navier Stokes equation on a two dimensional torus. Then, in subsection 12. 2[ we describe the observational data 
model that we employ; using this we apply Tikhonov-Phillips regularization to derive the filter which we use to 
combine model and data. Subsection 12.31 contains a specific example of this filter, used later in the paper for our 
numerical illustrations. This example constitutes a particular instance of the 3DVAR method. 

2.1. Forward Model: Navier- Stokes equation 

In this section wc establish a notation for, and describe the properties of, the Navier-Stokes equation. This is the 
forward model which underlies the inverse problem which we study in this paper. We consider the 2D Navier-Stokes 
equation on the torus T 2 := [0,L) x [0, L) with periodic boundary conditions: 

d t u - vAu + u-Vu + \7p = f for all (x, t) e T 2 x (0, oo), 
V-u =0 for all (x,t) G T 2 x (0,oo), 

u(x,0) = uq(x) for all x 6 T 2 . 

Here u: T 2 x (0, oo) — > R 2 is a time-dependent vector field representing the velocity, p: T 2 x (0, oo) — > R is a 
time-dependent scalar field representing the pressure, /: T 2 — >• R 2 is a vector field representing the forcing (which 
we assume to be time- independent for simplicity), and v is the viscosity. In numerical simulations (see section 0]), 
we typically represent the solution via the vorticity w and stream function £; these are related through u = V^£ 
and w = V ■ u, where V 1 " = (82, —di) T . We define 



H := < trigonometric polynomials u : T 



V-u = 0, / u{x)dx = Q\ 

Jt 2 J 



and H as the closure of H with respect to the (L 2 (T 2 )) 2 norm. We define P : (L 2 (T 2 )) 2 -4 H to be the Leray- 
Helmholtz orthogonal projector. 

Given k — [k\, k 2 ) T , define k 1 - := (k 2 , — ki) T . Then an orthonormal basis for H is given by tpk'- K 2 — > R 2 , where 

^(x):=^exp(^^) for k E Z 2 \ {0}. (1) 

Thus for u £ H we may write 

u = ^2 v>k(t)ipk(x) 

feez 2 \{o} 

where, since u is a real-valued function, we have the reality constraint w_fe = —Hk- We then define the projection 
operators P\ : H — > H and Q\ : H — > H by 

P x u= J2 Mt)Mx), Qx = I-P\- 

\2irk\ 2 <\L 2 

We let W x = P X H and W{ = Q X H. 

Using the Fourier decomposition of u, we define the fractional Sobolcv spaces 

H s :=\ueH: ]T (47r 2 |fc| 2 ) s K| 2 < 00 I (2) 

[ kez 2 \{o} J 

1 /9 

with the norm \\u\\ s := {^Zk{^ 2 \k\ 2 ) s \uk\ 2 ') , where set. We use the abbreviated notation ||m|| for the norm on 
H 1 , and | ■ | for the norm on H = H°. 



Applying the projection P to the Navier-Stokes equation we may write it as an ODE (ordinary differential 

This ODE takes the form 

uAu + B(u,u) = f, u{0) = u o . (3) 



equation) in H; see 22j, |23j, [24J for details. This ODE takes the form 

da 



dt 

Here, A = —PA is the Stokes operator, the term B{u,u) = P(u ■ Vu) is the bilinear form found by projecting the 
nonlinear term u ■ Vu onto H and finally, with abuse of notation, / is the original forcing, projected into H. We 
note that A is diagonalized in the basis comprised of the {V'fc}fcez 2 \{o}i on H, and the smallest eigenvalue of A is 
Ai = 4ir 2 /L 2 . The following proposition is a classical result which implies the existence of a dissipative semigroup 



for the ODE Eq. (EH). See Theorems 9.5 and 12.5 in [24J for a concise overview and 



23 



251 for further details. 



Proposition 2.1. Assume that Uq £ H 1 and f £ H. Then Eq. ([3\) has a unique strong solution on t £ [0,T] for 
any T > : 

nil 

u£L°°((0,T);H 1 )nL 2 ((0,T);D(A)), ^ € L 2 ((0,T);H). 

Furthermore the equation has a global attractor A and there is K > such that, if uq £ A, then sup t>0 ||u(t)|| 2 < K. 

We let {\I/(-,£) : H 1 — > H 1 }t>o denote the semigroup of solution operators for the equation Eq. ([3]) through t 
time units. We note that by working with weak solutions, ^(-,t) can be extended to act on larger spaces H s . with 



241. We will, on occasion, use this extension of 



s £ [0, 1), under the same assumption on /; see Theorem 9.4 in 
ty{-,t) to act on larger spaces. 

The key properties of the Navier-Stokes equation that drive our analysis of the filters are summarized in the 



following proposition, taken from the paper 2l(. To this end, define *&(•) = ^>(-,h) for some fixed h > 0. Note 



that the statement here is closely related to the squeezing property 24| of the Navier-Stokes equation, a property 
employed in a wide range of applied contexts. Furthermore, many other dissipative PDEs arc known to satisfy 
similar properties. 

Proposition 2.2. Let u £ A and v £ H 1 . There is (3 = f3(\f\,L,i>) > such that 

||*(«) - *(w)|| 2 < cxp(/3/i)||u - v\\ 2 . (4) 

Now let \\u — i'|| < R and assume that 

9c 8 / 3 (2K\ +iJ^8/3 

Ai 



, ^ 9c 8 / 3 /: 

\ 3 V 



A l 



where c is a dimensionless positive constant and K is the constant appearing in Proposition \2.1\ above. Then there 
exists t* = £*(|/|, L, v, A, R) with the property that, for all h £ (0, £*], there is 7 £ (0, 1) such that 

||Qa(*(«) -*(«)) f <l 2 \\u-vf. (5) 

Proof. The first statement is simply Theorem 3.8 from [21( . The second statement follows from the proof of Theorem 
3.9 in the same paper, modified at the end to reflect the fact that, in our setting, P\5(0) ^ 0. Note also that the 
constant A appearing on the right hand side of the lower bound for A in the statement of Theorem 3.9 in [2l| should 
be Ai (as the proof that follows in [21( shows) and that use of definition of K (see Theorem 3.6 of that paper) allows 
rewrite in terms of K - indeed the proof in that paper is expressed in terms of K. □ 

2.2. Inverse Problem: Filtering 

In this section we describe the basic problem of filtering the Navier Stokes equation Eq. ( [3] ): estimating 
properties of the state of the system sequentially from partial, noisy, sequential observations of the state. Throughout 
the following we write N for the natural numbers {1,2,3, . . .}, and Z + := N U {0} for the non-negative integers 
{0, 1, 2, 3, . . .}. Let H be the Hilbcrt spaces with inner product (•, •) and norm | • | defined via Eq. ([2]) with s = 0. If 



C l is a self- adjoint, positive-definite compact operator on H , which therefore has a symmetric square root C 1 ' 2 , 
we define 

<v>£ :=(',^ 1 -), II • \\c ■■= It' 1 ' 2 • I- 

Recall that we have defined $(•) = \l/(-, ft.) for some fixed ft > 0. We let X denote H 1 and define {uj}j £ %+, Uj E X 
by0 

u j+ i ■= *(«,-)• (6) 



This discrete dynamical system is well-defined by virtue of Proposition [2T2l Thus Uj = u(jh) where u is the solution 
ofEq. (EI). 

Our interest is in determining Uj from noisy observations of P\Uj. We now let {£j}jeN be a noise sequence in 
W\ which perturbs the sequence {P\Uj\jg® to generate the observation sequence {j/jljeN in W\ given by 

Vj+i : = p \u j+ i + £,j+i, j € Z + . (7) 

This observation operator allows us to study partially observed infinite dimensional systems in a clean fashion and 
the resulting analysis will be a useful building block for the study of other partial observations such as pointwise, 
or smoothed, observations on a regular grid in physical space. 

We let Yj = {yi} 3 i=1 , the accumulated data up to time t = jh. We assume that u$ is not known exactly. The 
goal of filtering is to determine the state Uj from the data Yj . The approach to the problem of blending data and 
model that we study here is based on Tikhonov-Phillips regularization. We find a point which represents the best 
compromise between information given by the model and by the data. More precisely, we use the model to provide 
a regularization of a least squares problem designed to match the data. This will result in a sequence {uj}jgz+ 
which approximates the true signal {Mj}jez+ gi vm g r i se to the data. To define the sequence {%}jez+ we introduce 
two sequences of operators {rj}jgN, and {CjjjgN which will be used to weight the contributions of model and data 
at discrete time j. Then define 

ijM = l\\yi+i-PM\h + , + lh-*(uMh +1 - (8) 

Choosing a minimizer of the functional encodes the idea of a compromise between the model output ^(uj) and the 
data Uj+i, to estimate the state of the system at time t = (j + l)ft. The operators Cj+\ and Tj+i give appropriate 
weights on the two sources of information. With this in mind we set 

Uj+i = arginf uGH i/ 7 (u). (9) 

The following theorem shows that this is justified: 

Theorem 2.3. Assume that Cj+i and r j+ i are positive-definite self-adjoint operators on H and P\H respectively, 
that Cj+i is a bounded operator from H l into itself and that the norm \\ ■ \\c +1 is equivalent to the H r norm for 
some r > 1. Then, ifuo e H , the iteration @ 0) determines a unique sequence {uj}jgN in H 1 given by 

Uj+i = (/ - K j+1 )V(uj) + K ]+m+1 (10) 

where 

K j+1 = C J+1 P* X (r,-+i + PxC J+1 P* x ) ^Px. (11) 



1 With abuse of notation, subscripts j will indicate times, while subscripts k will denote Fourier coefficients as before in order to 
avoid confusion. The meaning should also be clear in context. 



Proof. The proof follows by a simple induction. Assume that tij G H 1 . Then let u = &(uj) + v and note that the 
functional Jj : H r — > R + given by 

JM = \\\y j+ x - Px{*(u 3 ) + v)\\l j+1 + \\\v\\l +1 (12) 

is convex and weakly lower semi-continuous and hence has a unique minimizcr v G H r . Thus Uj+i = v + Uj is in 
H 1 since Uj G H 1 and v G H r with r > 1. Equations (|101 lll[) for the minimizer follow from standard theory of 
calculus of variations. □ 

In the case where \&(-) is a linear map, the matrix Kj + \ is the Kalman gain matrix [10| . composed with the 
observation operator Pa- We will also study the situation where complete observations are made, obtained by taking 
A — > oo in the preceding analyses. The observations are given by 

yj-Uj + Sj, jez+ (13) 

where now 2/j,£j G H. The operator Tj is now a positive self-adjoint operator on H. The filter that we study takes 
the form (|TU1) with P\ replaced by the identity so that (fTTj) is replaced by 

K j+l =C j+1 (T j+l +C j+l ) \ (14) 

If we define 

B j= I- K j+1 (15) 

then Theorem 12.31 yields the key equation 

u J+1 = Bj^uj) + (I - B s )y i+1 . (16) 

This equation and Eq. ( [TU] ) demonstrate that the estimate of the solution at time j + 1 is found as an operator- 
convex combination of the true dynamics applied to the estimate of the solution at time j, and the data at time 
j + 1. We will favor use of Bj in what follows, rather than ifj+i, as Bj is the operator which controls the stability, 
and hence accuracy, of the estimate. 

This problem can be given a Bayesian formulation in which the probability distribution of Uj\Yj is the primary 
object of interest. In practice, for high dimensional systems arising in applications in the atmospheric sciences, 
various ad hoc Gaussian approximations are typically used to approximate these probability distributions; the 
reader interested in understanding the derivation of filters from this probabilistic perspective is directed to 



261 for 



details; this unpublished technical report is an expanded version of the material contained here, to incorporate the 
probabilistic perspective. The mean of a Gaussian posterior distribution has an equivalent formulation as solution 
of a Tikhonov-Phillips regularized least squares problem, as we have here, and this perspective on data assimilation 



is adopted in [27[ . In [27( linear autonomous dynamical systems are studied and filter accuracy and stability results 



similar to ours are derived. 

It is demonstrated numerically in [171 ] that the Gaussian approximations are, in general, not good approximations. 
More precisely, they fail to accurately capture covariance information. However, the same numerical experiments 
reveal that the methodology can perform well in replicating the mean, if parameters are chosen correctly, even if 
it is initially in error. Indeed this accurate tracking of the mean is often achieved by means of variance inflation 
- increasing the model uncertainty, here captured in the exogenously imposed Cj, in comparison with the data 
uncertainty, here captured in the Tj. The purpose of the remainder of the paper is to explain, and illustrate, this 
phenomenon by means of analysis and numerical experiments. 



2.3. Example of a Filter: 3DVAR 

The algorithm described in the previous section yields the well-known 3DVAR method, discussed in the intro- 
duction, when Cj = C and Tj = T for some fixed operators C, I\ To impose commutativity with A, we assume that 
the operators T and C are both fractional powers of the Stokes operator A, in W\ and H respectively. We choose 
A$ = £A (the parameter I forms a useful normalizing constant in the numerical experiments of section 0]) and set 
C = S 2 Aq ^ in H and T = a 2 A^ in W\. Substituting into the update formula Eq. ( \TE\ ) for zij and defining 
j] = a/5, a = C-P, B (v) = (I + rj 2 ' A 2 a )- 1 rj 2 A 2 ," in W\ then in dHJ) we have Bj = B : W\ X W£ ->■ W\ X W£ the 
constant operator 

Using this we obtain the mean update formula 

u j+l = B^>(u j ) + (I-B)y j+1 . (18) 

Notice that for C,T given as above, the algorithm depends only on the three parameters A, a and r\, once the 
constant of proportionality I in Aq is set. The parameter A measures the size of the space in which observations 
are made; for fixed wavevector k, the parameter r\ is a measure of the scale of the uncertainty in observations to 
uncertainty in the model; and the sign of the parameter a determines whether, for fixed 77 and asymptotically for 
large wavevectors, the model is trusted more (a > 0) or less (a < 0) than the data. This can be seen by noting 
that if a > then Bip k — > ip k , while if a < then Bip k — > 0." 

In the case A = 00, the case of complete observations where the whole velocity field is noisily observed, we again 
obtain Eq. ([18]), with B = B a {rf) = (I + rj 2 A 2 ) a )^ 1 rj 2 A 2 l a in H. The roles of 77 and a are the same as in the finite 
A (partial observations) case. 

The discussion concerning parametric dependence with respect to varying 77 shows that, for the example of 
3DVAR introduced here, and for both A finite and infinite, variance inflation, which refers to reducing faith in the 
model in comparison with the data, can be achieved by decreasing the parameter r\. We will show that variance 
inflation does indeed improve the ability of the filter to track the signal. 

3. Accuracy and Stability 

In this section we develop conditions under which it is possible to prove stability of the nonautonomous dynamical 
system defined by the mean update equation Eq. ([16]) and show that, after a sufficiently long time, the true signal 
is accurately recovered. By stability we here mean that two filters driven by the same noise observation will converge 
towards the same estimate of the solution. By accuracy we mean that when the noise perturbing the observations 
is 0(e), the filter will converge to an 0(e) neighbourhood of the true signal, even if initially it is an 0(1) distance 
from the true signal. In subsection 13 . 1 1 we study the case of partial observations; subsection 13 . 21 contains the (easier) 
result for the case of complete observations. The third subsection 13.31 shows how our results can be applied to 
the specific instance of the 3DVAR algorithm introduced in subsection 12. 3[ for any a £ R, provided 77, which is a 
measure of the ratio of uncertainty in the data to uncertainty in the model, is sufficiently small: this, then, is a 
result concerning variance inflation. 



For simplicity, we will assume a "truth" which is on the global attractor, as in [21|. This is not necessary, but 
streamlines the presentation as it gives an automatic uniform in time bound in H . Recall that || ■ || denotes the 
norm on H , and | • | the norm on H; similarly we lift || • || to denote the induced operator norm on H l — > H . 

It is useful to recall the filter in the form (JTHJ): 

u j+1 = B^iuj) + (I- B,)y J+1 . (19) 



It is also useful to consider a second filter driven by the same data {yj}j£Z+, but possibly started at a different 
point: 

w j+1 = B^(wj) + (I- BjOVj+i- (20) 

3.1. Main Result: Partial Observations 

In this case we will see that it is crucial that the observation space W\ is sufficiently large, i.e. that a sufficiently 
large number of modes are observed. This, combined with the contractivity in the high modes encapsulated in 



Proposition 12.21 from [2l| , can be used to ensure stability if combined with variance inflation. We study filters of 
the form given in Eq. ([16]) and make the following assumption on the observations {ijj}. 

Assumption 3.1. Consider a sequence Uj = u(jh), where u(t) is a solution of Eq. (\^) lying on the global attractor 
A. Then, for some A G (A l7 oo), 

V] = P \ U 3 + & 

for some sequence £j satisfying sup ; j >1 ||£j|| < e. 

Note that this assumption, concerning uniform boundedness of the noise, is not verified for the i.i.d. Gaussian 
case. However we do expect that a more involved analysis would enable us to handle Gaussian noise, at the expense 



of proving results in mean square, or in probability. Indeed in 28| we study a continuous time limit of the set-up 
contained in this paper, in which white noise forcing arises from the i.i.d. Gaussian noise; accuracy and stability 
results can then indeed be proved in mean square and in probability. However, we believe that, for clarity, the 
assumption made in this paper enables us to convey the important ideas in the most straightforward fashion. 

We make the following assumption about the family {Bj}, and assumed dependence on a parameter i] G IR + . 
Recall that the inverse of r\ quantifies the amount of variance inflation. 

Assumption 3.2. The family of positive operators {Bj{if)\ H 1 — >• H 1 }j>± commute with A, satisfy sup,, >x ||-Bj(?7)|| < 
I, and sup ?>1 \\I — Bj(r))\\ < b for some b G R + , uniformly with respect to r\. Furthermore, (l — Bj(i]))Q\ = and 
there is, for all A > Ai, constant c = c(A) > such that sup..^ ||P\-Bj(??)|| < erf. 

We now study the asymptotic behaviour of the filter under these assumptions. 

Theorem 3.3. Let Assumptions [3~l\ and \3~M hold, choose any uq,wq £ Bj^i (u(0),r) and let (X*,t*) be as given in 
Proposition \2. C A Assume that A > A*. Then for any h G (0,i*] there is 77 sufficiently small so that the sequences 
{uj}j>0, {wj}j>o given by Eq. (\TB\), Eq. (\2U\) satisfy, for some a G (0,1), 

|| Wj ~Wj\\ < a 3 \\u Q -w Q \\ 

and 

i-i 

\\uj - Uj\\ < a?r + 2be'S^a t . 

i=0 

Hence 

limsup \\Uj — Uj\\ < e. 

j — ^00 r (2 

Proof. We prove the second, accuracy, result concerning \\uj — Uj\\. The stability result concerning \\v,j — Wj\\ is 
proved similarly. Assumption 13.21 shows that yj+i = P\^>(uj) + £j+i. Recall that in Eq. ( HH] ) yj+\ has been 
extended to an element of H, by defining it to be zero in W£, and we do the same with £-/+i- Substituting the 
resulting expression for yj+i in Eq. ( [16] ) we obtain 

u J+1 = B^iuj) + (I - B^P^iuj) + (I- B^+i 

but since (/ — Bj)Q\ = by assumption we have 

u j+1 = B^lfij) + (I- B,-)*(%) + (I - Bj)Zj+i- (21) 



Note also that 

Uj+i = Bj&(uj) + (I - Bj)$(uj). 

Subtracting gives the basic equation for error propagation, namely 



l 3 + l 



i, +1 = £,(*(%) - *(«,-)) + (I- B^ j+1 . (22) 



Since A > A* the second item in Proposition 12.21 holds. Fix a £ (7,1) where 7 is defined in Proposition 
Assume, for the purposes of induction, that 

j-i 
\\tij - Uj\\ < a 3 r + 2be V" a 1 . 

i=0 

Define R = 2r noting that the inductive hypothesis implies that, for e sufficiently small, \\uj— u 3 -|| < r+2b(l — a)~ 1 e < 
R. Applying P\ to Eq. ( [22] ) and using Eq. ( 2] ) gives 

\\P x (u j+1 -u j+1 )\\ < ||PaBj|||| (*(%)-*(%)) II + \\Px(I - Bj)\\e 

< c(A)?7 2 exp(/3/i/2)||u i - Uj\\ + be. 

Applying Q\ to Eq. ( [52] ) and using Eq. ( [5] ) giveqj 

|[Q A (S i+ i-« i+ i)|| <||B i ||||gA(*(S i )-*(«i))|| + ||QA(/--Bi)||e 

< 7||uj - Uj\\ + be. 

1 
Now note that, for any w G H 1 , \\w\\ = (||_Paw|| 2 + ||<5aio|| 2 ) 2 < ||-P\w|| + ||QaHI- Thus, by adding the two previous 
inequalities, we find that 

||tt,- +1 -« i+ i|| < (c(\)r 1 2 exp(f3h/2)+~f)\\u J ~u J \\+2be. 

Since 7 € (0, 1) and a 6 (7, 1), we may choose rj sufficiently small so that 

||%+i — Uj+ill < a\\uj — Uj\\ + 2be. 

and the inductive hypothesis holds with j 1-4 j + l . Taking j —y 00 gives the desired result concerning the limsup. □ 

Remark 3.4. Note that the proof exploits the fact that Bj^/(-) induces a contraction within a finite ball in H 1 . 
This contraction is established by means of the contractivity of Bj in W\, via variance inflation, and the squeezing 
property ofty(-) in W£, for large enough observation space, from Proposition \2. 2\ 

There are two important conclusions from this theorem. The first is that, even though the solution is only observed 
in the low modes, there is sufficient contraction in the high modes to obtain an error in the entire estimated state 
which is of the same order of magnitude as the error in the (low mode only) observations. The second is that this 
phenomenon occurs even when the initial estimate suffers from an 0(1) error. Of course this result also shows that, 
if the true solution starts in an 0(e) neighbourhood of the truth, then it remains there for all positive time. 

3.2. Main Result: Complete Observations 

Here we study filters of the form given in Eq. ( [16] ) with observations given by Eq. ( [13] ) . In this situation the 
whole velocity field is observed and so, intuitively, it should be no harder to obtain stability than in the partially 
observed case. The proof is in fact almost identical to the case of partial observations, and so we omit the details. 
We observe that, although there is no parameter A in the problem statement itself, it is introduced in the proof: as 
in the previous subsection, see Remark 13.41 the key to stability is to obtain contraction in W? using the squeezing 
property of the Navicr-Stokcs equation, and contraction in W\ using the properties of the filter to control unstable 
modes. 

We make the following assumptions: 



The term be on the right-hand side of the final identity can here be set to zero because (/ — Bj)Q\ = 0; however in the analogous 
proof of Theorcm l3,7l it is present and so we retain it for that reason. 



Assumption 3.5. Consider a sequence Uj = u(jh) where u(t) is a solution of Eq. (W[) lying on the global attractor 
A. Then 

Vj = u j + & 
for some sequence £j satisfying sup J>1 \\£j\\ < e. 

Assumption 3.6. The family of positive operators {Bj(rj): H 1 — > H l }j>\ commute with A sup ?>1 ||-Bj(??)|| < 1, 
and sup ■ >1 \\I — Bj(r/)\\ < b for some b E R + , uniformly with respect to rj. Furthermore, for all A > Ai, there is a 
constant c = c(A) > such that sup y>1 ||P\-Bj(tj)|| < c?/ 2 . 

We now study the asymptotic behaviour of the filter under these assumptions. 

Theorem 3.7. Let Assumvtions [3~5\ and [3~d\ hold and choose any uq,wq E M H i(u(0),r). Then for any h E (0,£*], 
with t* given in Proposition \2. ffl there is r\ sufficiently small so that the sequences {uj}j>o, {^j}j>o given by Eq. ( 
HE), Eq. (\M) satisfy, for some a £ (0, 1), 

||% -Wj\\ < a^uo-woW 

and 

j-i 

\\uj - Uj || < a j r + 2be V] a\ 

8=0 

Hence 

hmsup \\Uj — Uj\\ < e. 

j — ^oo ^ ^ 

Proof. The proof is nearly identical to that of Theorem 13.31 Differences arise only because we have not assumed 
that (I — Bj)Q\ = 0. This fact arises in two places in Theorem 13.31 The first is where we obtain Eq. ( [21] ). 
However in this case we directly obtain Eq. ( [21] ) since the whole velocity field is observed. The second place it 
arises is already dealt with in the footnote appearing in the proof of Theorem 13.31 when estimating the contraction 
properties in W£; there we indicate that the proof is already adjusted to allow for the situation required here. □ 

Remark 3.8. 7/sup ?>1 ||Sj(?y)|| < crj 2 then the proof may be simplified considerably as it is not necessary to split 
the space into two parts, W\ and W£. Instead the contraction of Bj can be used to control any expansion in *&(•), 
provided r\ is sufficiently small. 

3.3. Example of Main Result: 3DVAR 

We demonstrate that the 3DVAR algorithm from subscction l2.3l satisfies Assumptions l3.2l and l3.6l in the partially 
and completely observed cases respectively, and hence Theorems 13.31 and 13.71 respectively may be applied to the 
resulting filters. In particular, the filters will locate the true signal, provided -q is sufficiently small. Satisfaction of 
Assumptions 13.21 and 13.61 follows from the properties of 

B ( V ) = (I + r 1 2 A 2 a )- 1 r, 2 A 2 a , I - B (n) = (I + r, 2 A 2a )-\ 

Note that the eigenvalues of B (rj) are 

77 2 (4^ 2 |fc| 2 ) 2Q 



l+77 2 (4^7T 2 |fc| 2 ) 2a ' 



if Aq = I A. Clearly the spectral radius of Bo(rj) is less than or equal to one on W\ or H, independently of the sign 
of a. The difference is just that \k\ 2 < A/Ai in the former, and |fc| is unbounded in the latter. 

First we consider the partially observed situation. We note that Bj = B and is given by Eq. ( [17] ) : 

B= ( (i+^ArrwAr o \ (23) 
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so that the Kalman gain-like matrix I — B is given by 

From this it is clear that (I — B)Q\ = 0. Furthermore, since the spectral radius of Bo(r/) does not exceed one, the 
same is true of B. Hence for the operator norms from H 1 into itself we have ||B|| < 1. Similarly, if a < then 

b : = || J - B\\ = 1, whilst if a > then b = (l + n 2 {(.\i) 2a \ < 1. Thus Theorem [37J applies. Note that P X B = B Q 
and that B Q = 0(if). 

In the fully observed case we simply have Bj = B where B = Bo(n) defined above on H. Again ||B|| < 1 and 

if a < then \\I - B\\ = b = 1, whilst if a > then b = (l + ?7 2 (£Ai) 2q ) < 1. Thus Theorem [3J] applies. Note 
(see Remark l3.8p . that if a < then the proof of that theorem could be simplified considerably because ||S|| < 1 
and in fact sup J>1 ||-B|| < c?7 2 . 



Remark 3.9. We observe that the key conclusion of Theorems \3.3\ and \3.7\ is the asymptotic accuracy of the 
algorithm, when started at distances of 0(1). The asymptotic bound, although ofO(e), has constant -^- which may 
exceed 1 and so the bound may exceed the error obtained by simply using the observations to estimate the signal. 
Our numerics will show, however, that in practice the algorithm gives estimates of the state which improve upon 
the observations. 

4. Numerical Results 

In this section we describe a number of numerical results designed to illustrate the range of filter stability 
phenomena studied in the previous sections. We start, in subsection 14. 1[ by describing two useful bounds on the 
error committed by filters; we will use these guides in the subsequent numerics. Subscction l4.2l dcscribcs the common 
setup for all the subsequent numerical results shown. Subsection 14.31 describes these results in the case of complete 
observations in discrete time, whilst Subsection [474] extends to the case of partial observations, also in discrete time. 

Our theoretical results have been derived under Assumptions [3~T1 and [3~5l on the errors. These are incompatible 
with the assumption that the observational noise sequence is Gaussian. This is because i.i.d Gaussian sequences 
will not have finite supremum. However, in order to test the robustness of our theory we will conduct numerical 
experiments with Gaussian noise sequences. 

4-1- Useful Error Bounds 

We describe two useful bounds on the error which help to guide and evaluate the numerical simulations. To 

derive these bounds we assume that the observational noise sequence £j is i.i.d with E£j = and E£j (g> £_, = T. 

Then 

E|£,f=tr(r)=5> fe 
k 

where {gk} are the eigenvalues of the operator F. 

• The lower bound is derived from (|22p . Using the assumed independence of the sequence we see that 

E|u i+ i - u J+1 \ 2 > E|(7 - B^+il 2 = tr((7 - Bj)T(I - B,-)*) (25) 

• The upper bound on the filter error is found by noting that a trivial filter is obtained by simply using the 
observation sequence as the filter mean; this corresponds to setting Bj = in ((T5)) . For this filter we obtain 

E\u j+1 - u j+1 1 2 < E|&+i | 2 = tr(r) (26) 
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in the case of complete observations, and 

E|u 3 -+i - u ]+1 \ 2 < m 3+1 \ 2 + |Oa^+i| 2 = tr(r) + |Oa^ + i| 2 (27) 

in the case of incomplete observations. 

Although the lower bound (|25|) does not hold pathwise, only on average, it provides a useful guide for our 
pathwise experiments. The upper bounds (|26|) and (|27[) do not apply to any numerical experiment conducted with 
non-zero Bj, but also serve as a useful guide to those experiments: it is clearly undesirable to greatly exceed the 
error committed by simply trusting the data. We will hence plot the lower and upper bounds as useful comparators 
for the actual error incurred in our numerical experiments below. We note that, for the 3DVAR example from 
subsection 12.31 with complete observations, the upper and lower bounds coincide in the limit r\ — > as then B — > 0. 
For partial observations they differ by the second term in the upper bound. 

4-2. Experimental Setup 

For all the results shown we choose a box side of length L — 2. The forcing in Eq. (j3]) is taken to be / = V x ^, 
where ip — cos(7rfc • x) and V = JV with J the canonical skew-symmetric matrix, and k = (5,5). The method 



used to approximate the forward model §3§ is a modification of a fourth-order Rungc-Kutta method, ETD4RK [29(, 
in which the Stokes semi-group is computed exactly by working in the incompressible Fourier basis {ipk(x)} fcez 2 \{o} 
in Eq. ([1]), and Duhamcl's principle (variation of constants formula) is used to incorporate the nonlinear term. 
Spatially, a Galerkin spectral method |30j is used, in the same basis, and the convolutions arising from products 
in the nonlinear term are computed via FFTs. We use a double-sized domain in each dimension, buffered with 
zeros, resulting in 64 2 grid-point FFTs, and only half the modes in each direction are retained when transforming 
back into spectral space in order to prevent aliasing, which is avoided as long as fewer than 2/3 of the modes are 
retained. 

The dimension of the attractor is determined by the viscosity parameter v. For the particular forcing used there 



is an explicit steady state for all v > and for v > 0.035 this solution is stable (see [3l|, Chapter 2 for details). As v 
decrease the flow becomes increasingly complex and the regime v < 0.016 corresponds to strongly chaotic dynamics 
with an upscale cascade of energy in the spectrum. We focus subsequent studies of the filter on a strongly chaotic 
[y = 0.01) parametric regime. For this small viscosity parameter, we use a time-step of St — 0.005. 

The data is generated by computing a true signal solving Eq. ( [3] ) at the desired value of v, and then adding 
Gaussian random noise to it at each observation time. Such noise does not satisfy Assumption 13.51 since the 
suprcmum of the norm of the noise sequence is not finite, and so this setting provides a severe test beyond what 
is predicted by the theory; nonetheless, it should be noted that Gaussian random variables only obtain arbitrarily 
large values arbitrarily rarely. 

All experiments are conducted using the 3DVAR setup and it is useful to reread the end of subsection 12.31 in 
order to interpret the parameters a and r\. We consider both the choices a = ±1 for 3DVAR, noting that in the case 
a. = — 1 the operator B has norm strictly less than one and so we expect the algorithm to be more robust in this 
case (see Remark I3T51 for discussion of this fact). For all experiments we set £ = A^ 1 which ensures that the action 
of Aq", and hence B, on the first eigenfunction is independent of the value of a; this is a useful normalization when 
comparing computations with a = 1 and a = — 1. 

We set the observational noise to constant white noise Tj =T = a 2 I (i.e. j3 = in section [273]) . Here a = 0.04, 
which gives a standard deviation of approximately 10% of the maximum standard deviation of the strongly chaotic 
dynamics. Since we are computing in a truncated finite-dimensional basis the eigenvalues are summable; the 
situation can be considered as an approximation of an operator whose eigenvalues decay rapidly outside the basis 
in which we compute. 
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To be more precise regarding the algorithm, let U denote the finite-dimensional spectral representation, which 
is a complex-valued vector of dimension 32 2 , including redundancy arising from reality and zero mass constraints. 
The computation of B in Eq. ([3]) requires padding this with zeros to a 64 2 vector, computing inverse FFTs on 
the discretization of the 6 fields u i: u it j for i,j g {1,2}, performing products, computing FFTs on the 2 resulting 
(discrete) spatial fields, and finally discarding the now-populated padding modes. Denoting the discrete map from 
time t to time t + s by 4> S (M ), the experiments proceed precisely as follows: 

• Evolve Ut = $t(£/o) until statistical equilibrium, as judged by observing the energy fluctuations E\U t {t)] = 
\\Ut\\2- Set Uq = Ut so that the initial condition is on the attractor. 

• Compute the observations Yj = &jh(Uo) +Af(0,T). 

• Draw Uq ~ A/"(0, kA 2 "), where k>1 [this is essentially arbitrary, as long as the initial condition is something 
sensible and such that \\Uo — foil = 0(1)}. 



• Compute B, and I — B as given in Equations 

• For j = 1, . . . , J : Compute U, = B$ fc (E/,-_i) + (I - B)Yj. 

4-3. Complete Observations 

We start by considering discrete and complete observations and illustrate Theorem 13.71 and in particular the 
role of the parameter r\. The experiments presented employ a large observation increment of h = 0.5 = lOOSt. For 
a = 1 we find that when r\ = a (Fig. [T|) the estimator stabilizes from an initial 0(1) error and then remains stable. 
The upper and lower bounds are satisfied (the upper bound after an initial rapid transient), and even the high 
modes, which are slaved to the low modes, synchronize to the true signal. For r\ = 10er (Fig. [2]) the estimator 
fails to satisfy the upper bound, but remains stable over a long time horizon; there is now significant error in the 
k = (7, 7) mode, in contrast to the situation with smaller r\ shown in Fig. [T] Finally, when r\ = lOOer (Fig. [3]), the 
estimator really diverges from the signal, although still remains bounded. 

When a = — 1 the lower and upper bounds are almost indistinguishable and, for all values of 77 examined, the 
error either exceeds or fluctuates around the upper bound; see Figures 21 [5] and [5] where n = a, 10a and lOOer 
respectively. It is not until r\ = lOOer (Fig. [5]) that the estimator really loses the signal. Notice also that the high 
modes of the estimator always follow the noisy observations and this could be undesirable. For both r\ = lOOer 
and lOcr, the a = — 1 estimator performs better than the one for a = 1 in terms of overall error, illustrating the 
robustness alluded to in Remark |XH since for a < we have \\B\\ < 1. However, an appropriately tuned a = 1 filter 
has the potential to perform remarkably well, both in terms of overall error and individual error of all modes (see 
Fig. [I] in contrast to Fig. 2|). In particular, this filter has an expected error substantially smaller than the upper 
bound, which docs not happen for the case of a = — 1 when complete observations are assimilated. 

4-4- Partial Observations 

We now proceed to examine the case of partial observations, illustrating Theorem 13.31 Note that the forced 
mode has magnitude |fc/| 2 = 50, so ensuring that it is observed requires that A > 50Ai. When enough modes 
are retained, for example when A = lOOAi in our setting, the results for the a = 1 case remain roughly the same 
and are not shown. However, in the case a = — 1, in which the observations are trusted more than the model at 
high wavevectors, the results are greatly improved by ignoring the observations of the high-frequencies. See Fig. 
[7J This improvement, and indeed the improvement beyond setting B — for both cases a = ±1 disappears as 
A is decreased. In particular, when A = 25Ai the error is never very much smaller than the upper bound. This 
is due to the fact that the dynamics of the low wavevectors tend to be unpredictable and they contain very little 
useful information for the assimilation. Then, for much smaller A = 4Ai, once enough unstable modes are left 
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unobserved, there is no convergence. The order of magnitude of the error in the asymptotic regime as a function of 
•q remains roughly consistent as A is decreased until the estimator no longer converges. For small h (high-frequency 
in time observations) and complete observations, the estimator can be slow to converge to the asymptotic regime. 
In this case, the number of iterations until convergence, for a sufficiently small ij, becomes significantly larger as A 
is decreased (again until the estimator fails to converge at all). 

Given A w fcf Ai, we expect that for r\ sufficiently small the contribution of the model to the filter will be 
negligible for all k with \k\ < k\ for a = 1. Hence the estimators for both a = ±1 will behave similarly. An example 
of this is shown in Fig. [8] where r\ = O.Olcr = 0.0004 and A = 49Ai in Fig. [8] In both cases, the estimator is 
essentially utilizing all the available observations. There are enough observations to draw the higher wavevectors of 
the estimator closer to the truth than if we just set the population of those modes to zero. In contrast, as mentioned 
above, when A = 25 Ai, there are not enough observations even when they are all used, and the error is roughly the 
same as the upper bound as rj — > (not shown). 

5. Conclusion 

This paper contains three main components: 

• we show that the filtering problem for the Navier-Stokes equation may be formulated as a a sequence of 
well-posed inverse problems: Theorem [ 



• we prove Theorems 13.31 and 13. 7\ which establish filter accuracy and stability provided variance inflation is 
employed; 

• we describe numerical results which illustrate, and extend the validity of, the theory. 

We note that the analysis will also apply to other semilinear dissipative PDEs which possess a squeezing property 
(contraction when projected into the high modes) and a global attractor; such equations are studied in depth in 
[25j, and include the Cahn- Allen and Cahn-Hilliard equations, the Kuramoto-Sivashinsky equation and Ginzburg- 
Landau equation. These two structural properties of the underlying model, when combined with sufficient variance 
inflation {q small enough in 3DVAR) enable proof that the 3DVAR filter has a contractive property, even when the 
underlying dynamical system itself has positive Lyapunov exponents. 

There are a number of natural future directions which stem from this work: 

• to develop analogous filter stability theorems for more sophisticated filters, such as the extended and ensemble- 
based methods, when applied to the Navier-Stokes equation; 

• to study model-data mismatch by looking at filter stability for data generated by forward models which differ 
from those used to construct the filter; 

• to study the effect of filtering in the presence of model error, by similar methods, to understand how this may 
be used to overcome problems arising from model-data mismatch. 



• 



to combine the analysis in this paper, which concerns nonlinear problems, but assumes that the observation 



operator and the Stokes operator commute, and the recent work [271 ] which concerns only linear autonomous 
problems but does not assume commutativity of the solution operator for the forward model and the obser- 
vation operator. 
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Figure 1: Example of a stable trajectory for 3DVAR with v = 0.01, h = 0.5, r\ = a = 0.04, a = 1. The top left plot shows the norm- 
squared error between the estimated mean, m(t n ) = m n , and the signal, u(tn), in comparison to the preferred upper bound (i.e. the 
total observation error tr(r) = S) and the lower bound tr[(7 — B n )T(I — B n )\. The other three plots show the estimator, m(t), together 
with the signal, n(t), and the observations, y n for a few individual modes. 
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Figure 2: Example of a destabilized trajectory for 3DVAR with the same parameters as in Fig. [T]cxcept the larger value of r\ = 10cr = 0.4. 
Panels are the same. 
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Figure 3: Example of a destabilized trajectory for 3DVAR, with the same parameters as in Fig. [T]except the larger value of r) = lOOcr = 4. 
Panels are the same. 
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Figure 4: Example of a stable trajectory for 3DVAR with the same parameters as in Fig. [T]cxcept with a = —1. Panels are the same. 
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Figure 5: Example of a stable trajectory for 3DVAR with the same parameters as in Fig. [2] except with value of a = — 1. Panels are 
the same. 
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Figure 6: Example of a destabilized trajectory for 3DVAR with the same parameters as in Fig. [3] except with value of < 
are the same. 
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Figure 7: Example of an improved estimator for partial observations with A = 100Ai and otherwise the same parameters as in Fig. \E\ 
Panels are the same. 
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Figure 8: Examples of estimators for partial observations with A = 49AJ and r] = O.lcr = 0.004, otherwise the same parameters as in 
Figs. [T]and|4] Left panels are for a = 1 and right panels are for a = — 1. 
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